#store string containing all required packages
my_packages <- c('rtweet', 'ggplot2', 'igraph', 'RColorBrewer', 'readr', 'bipartite')

Figure out which of these packages is already installed

# Store all installed packages
ya_installed <- library()$results[,1]

# Check whether required packages are already installed and grab only those that still need installation
need_install<-my_packages[!(my_packages %in% ya_installed)]

#install required packages
lapply({need_install}, install.packages, character.only = TRUE)
## list()
# Store all installed packages
ya_loaded <- (.packages())

# Check whether required packages are already installed and grab only those that still need installation
need_load<-my_packages[!(my_packages %in% ya_loaded)]

# Load required packages
lapply(need_load, require, character.only = TRUE)
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] TRUE
## 
## [[5]]
## [1] TRUE
## 
## [[6]]
## [1] TRUE
#also load the readr library
library(readr)
library(bipartite)
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)
library(igraph)
library(reshape2)

Our data comes from the Fed’s MBS purchase records. Each observation corresponds to a security the Fed purchased, including: the CUSIP identifier, purchase date, sale date (if sold), and the breakdown of the underlying mortgage principal across U.S. states and territories (in dollars). For example, a row might indicate the Fed bought Security X on 2010-05-15, with 30% of its mortgages from California, 10% from Florida, 5% from New York, etc. First, we load and clean the data (e.g., converting strings to dates and numbers). We compute each MBS’s total HoldingDays (sale date – purchase date). Then we reshape the state columns from wide to long format for analysis: each security becomes multiple records like (CUSIP, State, principal% of that MBS). Here’s a glimpse after cleaning:

# Load data (all columns read as characters initially)
data <- read_csv("org_full_time_series.csv", col_types = cols(.default = col_character()))

data <- data %>% select(-c(228,229,230,231))

# Convert numeric columns (e.g., UPB, face values, etc.) by removing "$" and ","
numeric_cols <- names(data)[7:ncol(data)]  # assuming numeric data start from column 7
data[numeric_cols] <- data[numeric_cols] %>% 
  mutate(across(everything(), ~parse_number(.)))

# Merge duplicate DC columns if present
if("District Of Columbia_Aggregate_UPB" %in% names(data)) {
  data <- data %>%
    mutate(`District of Columbia_Aggregate_UPB` = coalesce(`District of Columbia_Aggregate_UPB`, `District Of Columbia_Aggregate_UPB`),
           `District of Columbia_Percent_UPB` = coalesce(`District of Columbia_Percent_UPB`, `District Of Columbia_Percent_UPB`),
           `District of Columbia_Number_of_Loans` = coalesce(`District of Columbia_Number_of_Loans`, `District Of Columbia_Number_of_Loans`),
           `District of Columbia_Percent_Loans` = coalesce(`District of Columbia_Percent_Loans`, `District Of Columbia_Percent_Loans`)) %>%
    select(-starts_with("District Of Columbia_"))
}

# Convert date columns to Date type
data <- data %>%
  mutate(`Purchase Date` = as.Date(`Purchase Date`),
         `Sale Date` = as.Date(`Sale Date`))
         
# Create HoldingDays variable (difference between Sale Date and Purchase Date)
data_filtered <- data %>% mutate(HoldingDays = as.numeric(`Sale Date` - `Purchase Date`))

# Peek at the data structure
glimpse(data)
## Rows: 31,713
## Columns: 219
## $ CUSIP                            <chr> "3128M5KV1", "31292KML8", "3128M7EK8"…
## $ `Purchase Date`                  <date> 2009-09-30, 2009-09-30, 2009-09-30, …
## $ `Sale Date`                      <date> 2023-01-18, 2019-11-20, 2019-11-20, …
## $ `Security Description`           <chr> "FHLMC PC GOLD 6% 08/37", "FHLMC PC G…
## $ Term                             <chr> "30yr", "30yr", "30yr", "30yr", "15yr…
## $ `Purchase Face Value`            <chr> "187837097.5", "120842681.55", "97968…
## $ `Sale Face Value`                <dbl> 2906060.00, 5430483.38, 7399348.21, 3…
## $ Alabama_Aggregate_UPB            <dbl> NA, NA, 668340.35, 27300525.55, 44301…
## $ Alabama_Percent_UPB              <dbl> NA, NA, 0.57, 0.46, 0.65, 0.87, NA, 3…
## $ Alabama_Number_of_Loans          <dbl> NA, NA, 64, 113, 30, 21, NA, 13, 15, …
## $ Alabama_Percent_Loans            <dbl> NA, NA, 1.02, 0.56, 0.73, 1.11, NA, 3…
## $ Alaska_Aggregate_UPB             <dbl> NA, NA, 686581.8, 54830508.6, NA, NA,…
## $ Alaska_Percent_UPB               <dbl> NA, NA, 0.58, 0.93, NA, NA, NA, NA, N…
## $ Alaska_Number_of_Loans           <dbl> NA, NA, 44, 191, NA, NA, NA, NA, NA, …
## $ Alaska_Percent_Loans             <dbl> NA, NA, 0.70, 0.94, NA, NA, NA, NA, N…
## $ Arizona_Aggregate_UPB            <dbl> NA, 6464780.1, 2223283.5, 72709482.1,…
## $ Arizona_Percent_UPB              <dbl> NA, 0.93, 1.89, 1.23, 1.72, 1.10, 1.5…
## $ Arizona_Number_of_Loans          <dbl> NA, 27, 103, 290, 58, 22, 20, 4, 12, …
## $ Arizona_Percent_Loans            <dbl> NA, 0.85, 1.63, 1.42, 1.42, 1.17, 1.4…
## $ Arkansas_Aggregate_UPB           <dbl> NA, NA, NA, 27435617.6, NA, NA, 20838…
## $ Arkansas_Percent_UPB             <dbl> NA, NA, NA, 0.47, NA, NA, 0.87, 3.46,…
## $ Arkansas_Number_of_Loans         <dbl> NA, NA, NA, 113, NA, NA, 10, 15, 10, …
## $ Arkansas_Percent_Loans           <dbl> NA, NA, NA, 0.56, NA, NA, 0.73, 3.56,…
## $ California_Aggregate_UPB         <dbl> NA, 21700106.3, 12544263.9, 156222397…
## $ California_Percent_UPB           <dbl> NA, 3.12, 10.68, 26.51, 6.58, 7.55, 9…
## $ California_Number_of_Loans       <dbl> NA, 83, 466, 4420, 154, 95, 80, 10, 2…
## $ California_Percent_Loans         <dbl> NA, 2.61, 7.39, 21.72, 3.76, 5.03, 5.…
## $ Colorado_Aggregate_UPB           <dbl> NA, 5139628.8, 4604514.1, 227337233.2…
## $ Colorado_Percent_UPB             <dbl> NA, 0.74, 3.92, 3.86, 2.00, 2.04, 1.4…
## $ Colorado_Number_of_Loans         <dbl> NA, 23, 205, 844, 63, 37, 15, 6, 18, …
## $ Colorado_Percent_Loans           <dbl> NA, 0.72, 3.25, 4.15, 1.54, 1.96, 1.0…
## $ Connecticut_Aggregate_UPB        <dbl> NA, 9973831.4, 1285096.4, 57659205.8,…
## $ Connecticut_Percent_UPB          <dbl> NA, 1.43, 1.09, 0.98, 1.22, 1.20, 0.7…
## $ Connecticut_Number_of_Loans      <dbl> NA, 43, 71, 189, 42, 20, 7, NA, 8, 4,…
## $ Connecticut_Percent_Loans        <dbl> NA, 1.35, 1.13, 0.93, 1.03, 1.06, 0.5…
## $ Delaware_Aggregate_UPB           <dbl> NA, 5743523.9, NA, 26070316.8, NA, NA…
## $ Delaware_Percent_UPB             <dbl> NA, 0.82, NA, 0.44, NA, NA, NA, NA, N…
## $ Delaware_Number_of_Loans         <dbl> NA, 25, NA, 96, NA, NA, NA, NA, NA, 4…
## $ Delaware_Percent_Loans           <dbl> NA, 0.79, NA, 0.47, NA, NA, NA, NA, N…
## $ Florida_Aggregate_UPB            <dbl> NA, 10589908.5, 3561820.4, 108253948.…
## $ Florida_Percent_UPB              <dbl> NA, 1.52, 3.03, 1.84, 2.53, 1.29, 1.4…
## $ Florida_Number_of_Loans          <dbl> NA, 47, 137, 419, 101, 28, 17, 14, 16…
## $ Florida_Percent_Loans            <dbl> NA, 1.48, 2.17, 2.06, 2.47, 1.48, 1.2…
## $ Georgia_Aggregate_UPB            <dbl> NA, 6745409.86, 3901019.98, 161236018…
## $ Georgia_Percent_UPB              <dbl> NA, 0.97, 3.32, 2.74, 2.91, 3.46, 2.6…
## $ Georgia_Number_of_Loans          <dbl> NA, 33, 394, 596, 128, 68, 43, 15, 18…
## $ Georgia_Percent_Loans            <dbl> NA, 1.04, 6.25, 2.93, 3.13, 3.60, 3.1…
## $ Guam_Aggregate_UPB               <dbl> NA, 4315916, NA, NA, NA, NA, NA, NA, …
## $ Guam_Percent_UPB                 <dbl> NA, 0.62, NA, NA, NA, NA, NA, NA, NA,…
## $ Guam_Number_of_Loans             <dbl> NA, 20, NA, NA, NA, NA, NA, NA, NA, N…
## $ Guam_Percent_Loans               <dbl> NA, 0.63, NA, NA, NA, NA, NA, NA, NA,…
## $ Hawaii_Aggregate_UPB             <dbl> NA, 4002131.1, 1178502.9, 41131196.9,…
## $ Hawaii_Percent_UPB               <dbl> NA, 0.57, 1.00, 0.70, NA, NA, NA, NA,…
## $ Hawaii_Number_of_Loans           <dbl> NA, 14, 198, 113, NA, NA, NA, NA, NA,…
## $ Hawaii_Percent_Loans             <dbl> NA, 0.44, 3.14, 0.56, NA, NA, NA, NA,…
## $ Idaho_Aggregate_UPB              <dbl> NA, NA, NA, 46282025.5, NA, 2115365.5…
## $ Idaho_Percent_UPB                <dbl> NA, NA, NA, 0.79, NA, 0.56, NA, 2.88,…
## $ Idaho_Number_of_Loans            <dbl> NA, NA, NA, 194, NA, 15, NA, 12, 7, 1…
## $ Idaho_Percent_Loans              <dbl> NA, NA, NA, 0.95, NA, 0.79, NA, 2.85,…
## $ Illinois_Aggregate_UPB           <dbl> NA, 80226728.5, 6934001.1, 105953557.…
## $ Illinois_Percent_UPB             <dbl> NA, 11.52, 5.90, 1.80, 11.52, 8.22, 6…
## $ Illinois_Number_of_Loans         <dbl> NA, 356, 315, 391, 514, 153, 93, 10, …
## $ Illinois_Percent_Loans           <dbl> NA, 11.19, 5.00, 1.92, 12.55, 8.11, 6…
## $ Indiana_Aggregate_UPB            <dbl> NA, 22235271.6, 4700892.3, NA, 301872…
## $ Indiana_Percent_UPB              <dbl> NA, 3.19, 4.00, NA, 4.41, 2.04, 3.02,…
## $ Indiana_Number_of_Loans          <dbl> NA, 113, 134, NA, 229, 46, 49, 6, 14,…
## $ Indiana_Percent_Loans            <dbl> NA, 3.55, 2.13, NA, 5.59, 2.44, 3.56,…
## $ Iowa_Aggregate_UPB               <dbl> NA, 4815682.0, 1161811.9, 51332910.9,…
## $ Iowa_Percent_UPB                 <dbl> NA, 0.69, 0.99, 0.87, 1.15, 2.23, 1.1…
## $ Iowa_Number_of_Loans             <dbl> NA, 25, 86, 230, 60, 55, 22, 7, 14, 1…
## $ Iowa_Percent_Loans               <dbl> NA, 0.79, 1.36, 1.13, 1.46, 2.91, 1.6…
## $ Kansas_Aggregate_UPB             <dbl> NA, 8158904.0, NA, NA, 7330728.9, 276…
## $ Kansas_Percent_UPB               <dbl> NA, 1.17, NA, NA, 1.07, 0.73, 1.08, 0…
## $ Kansas_Number_of_Loans           <dbl> NA, 39, NA, NA, 56, 16, 19, 4, NA, NA…
## $ Kansas_Percent_Loans             <dbl> NA, 1.23, NA, NA, 1.37, 0.85, 1.38, 0…
## $ Kentucky_Aggregate_UPB           <dbl> NA, 18922944.3, 2408359.7, NA, 242450…
## $ Kentucky_Percent_UPB             <dbl> NA, 2.72, 2.05, NA, 3.54, 2.78, 1.73,…
## $ Kentucky_Number_of_Loans         <dbl> NA, 93, 96, NA, 170, 60, 38, 3, 7, 7,…
## $ Kentucky_Percent_Loans           <dbl> NA, 2.92, 1.52, NA, 4.15, 3.18, 2.76,…
## $ Louisiana_Aggregate_UPB          <dbl> NA, NA, NA, 41945902.2, 6226414.5, 20…
## $ Louisiana_Percent_UPB            <dbl> NA, NA, NA, 0.71, 0.91, 0.55, NA, 2.9…
## $ Louisiana_Number_of_Loans        <dbl> NA, NA, NA, 172, 36, 11, NA, 13, 25, …
## $ Louisiana_Percent_Loans          <dbl> NA, NA, NA, 0.85, 0.88, 0.58, NA, 3.0…
## $ Maine_Aggregate_UPB              <dbl> NA, 10475648.0, NA, NA, 6537219.2, 24…
## $ Maine_Percent_UPB                <dbl> NA, 1.50, NA, NA, 0.96, 0.65, NA, NA,…
## $ Maine_Number_of_Loans            <dbl> NA, 50, NA, NA, 43, 11, NA, NA, NA, N…
## $ Maine_Percent_Loans              <dbl> NA, 1.57, NA, NA, 1.05, 0.58, NA, NA,…
## $ Maryland_Aggregate_UPB           <dbl> NA, 7707617.6, 2016762.4, 165471733.3…
## $ Maryland_Percent_UPB             <dbl> NA, 1.11, 1.72, 2.81, 2.40, 2.95, 2.5…
## $ Maryland_Number_of_Loans         <dbl> NA, 30, 181, 540, 69, 46, 26, 8, 9, 5…
## $ Maryland_Percent_Loans           <dbl> NA, 0.94, 2.87, 2.65, 1.68, 2.44, 1.8…
## $ Massachusetts_Aggregate_UPB      <dbl> NA, 14046574.0, 1287181.9, 74057034.6…
## $ Massachusetts_Percent_UPB        <dbl> NA, 2.02, 1.10, 1.26, 3.86, 5.67, 2.8…
## $ Massachusetts_Number_of_Loans    <dbl> NA, 60, 109, 249, 121, 86, 30, NA, NA…
## $ Massachusetts_Percent_Loans      <dbl> NA, 1.89, 1.73, 1.22, 2.95, 4.56, 2.1…
## $ Michigan_Aggregate_UPB           <dbl> NA, 34407872.6, 9141046.4, 23341862.7…
## $ Michigan_Percent_UPB             <dbl> NA, 4.94, 7.78, 0.40, 5.30, 1.74, 2.4…
## $ Michigan_Number_of_Loans         <dbl> NA, 160, 259, 100, 251, 33, 43, 5, 9,…
## $ Michigan_Percent_Loans           <dbl> NA, 5.03, 4.11, 0.49, 6.13, 1.75, 3.1…
## $ Minnesota_Aggregate_UPB          <dbl> NA, 15069349.8, 2729141.3, 242325680.…
## $ Minnesota_Percent_UPB            <dbl> NA, 2.16, 2.32, 4.11, 1.69, 4.97, 4.7…
## $ Minnesota_Number_of_Loans        <dbl> NA, 72, 250, 956, 72, 91, 67, 16, 24,…
## $ Minnesota_Percent_Loans          <dbl> NA, 2.26, 3.97, 4.70, 1.76, 4.82, 4.8…
## $ Mississippi_Aggregate_UPB        <dbl> NA, NA, NA, NA, NA, NA, NA, 391517.46…
## $ Mississippi_Percent_UPB          <dbl> NA, NA, NA, NA, NA, NA, NA, 0.97, 0.8…
## $ Mississippi_Number_of_Loans      <dbl> NA, NA, NA, NA, NA, NA, NA, 4, 6, NA,…
## $ Mississippi_Percent_Loans        <dbl> NA, NA, NA, NA, NA, NA, NA, 0.95, 0.8…
## $ Missouri_Aggregate_UPB           <dbl> NA, 17264398.4, 3315209.0, 45593935.5…
## $ Missouri_Percent_UPB             <dbl> NA, 2.48, 2.82, 0.77, 3.34, 3.73, 1.8…
## $ Missouri_Number_of_Loans         <dbl> NA, 88, 182, 194, 160, 90, 32, 11, 22…
## $ Missouri_Percent_Loans           <dbl> NA, 2.77, 2.89, 0.95, 3.91, 4.77, 2.3…
## $ Montana_Aggregate_UPB            <dbl> NA, NA, NA, 30756797.6, NA, NA, NA, N…
## $ Montana_Percent_UPB              <dbl> NA, NA, NA, 0.52, NA, NA, NA, NA, 1.4…
## $ Montana_Number_of_Loans          <dbl> NA, NA, NA, 131, NA, NA, NA, NA, 10, …
## $ Montana_Percent_Loans            <dbl> NA, NA, NA, 0.64, NA, NA, NA, NA, 1.4…
## $ Nebraska_Aggregate_UPB           <dbl> NA, NA, NA, NA, 6667142.3, 2599290.0,…
## $ Nebraska_Percent_UPB             <dbl> NA, NA, NA, NA, 0.97, 0.69, NA, NA, 1…
## $ Nebraska_Number_of_Loans         <dbl> NA, NA, NA, NA, 37, 14, NA, NA, 14, 5…
## $ Nebraska_Percent_Loans           <dbl> NA, NA, NA, NA, 0.90, 0.74, NA, NA, 2…
## $ Nevada_Aggregate_UPB             <dbl> NA, NA, 763262.4, 31635586.2, NA, NA,…
## $ Nevada_Percent_UPB               <dbl> NA, NA, 0.65, 0.54, NA, NA, NA, 0.95,…
## $ Nevada_Number_of_Loans           <dbl> NA, NA, 37, 126, NA, NA, NA, 4, NA, N…
## $ Nevada_Percent_Loans             <dbl> NA, NA, 0.59, 0.62, NA, NA, NA, 0.95,…
## $ `New Hampshire_Aggregate_UPB`    <dbl> NA, 11190888.9, NA, NA, 7053649.9, 26…
## $ `New Hampshire_Percent_UPB`      <dbl> NA, 1.61, NA, NA, 1.03, 0.70, 1.40, N…
## $ `New Hampshire_Number_of_Loans`  <dbl> NA, 50, NA, NA, 34, 14, 14, NA, NA, N…
## $ `New Hampshire_Percent_Loans`    <dbl> NA, 1.57, NA, NA, 0.83, 0.74, 1.02, N…
## $ `New Jersey_Aggregate_UPB`       <dbl> NA, 6092436.5, 3134652.2, 260554205.8…
## $ `New Jersey_Percent_UPB`         <dbl> NA, 0.87, 2.67, 4.42, 1.46, 2.56, 2.3…
## $ `New Jersey_Number_of_Loans`     <dbl> NA, 22, 117, 810, 40, 36, 22, 6, 9, 7…
## $ `New Jersey_Percent_Loans`       <dbl> NA, 0.69, 1.86, 3.98, 0.98, 1.91, 1.6…
## $ `New Mexico_Aggregate_UPB`       <dbl> NA, NA, NA, NA, 5986529.0, NA, 214370…
## $ `New Mexico_Percent_UPB`         <dbl> NA, NA, NA, NA, 0.87, NA, 0.89, 1.26,…
## $ `New Mexico_Number_of_Loans`     <dbl> NA, NA, NA, NA, 35, NA, 13, 5, NA, NA…
## $ `New Mexico_Percent_Loans`       <dbl> NA, NA, NA, NA, 0.85, NA, 0.94, 1.19,…
## $ `New York_Aggregate_UPB`         <dbl> 559000004.5, 39097115.3, 1408295.2, 2…
## $ `New York_Percent_UPB`           <dbl> 100.00, 5.61, 1.20, 4.82, 1.08, 0.87,…
## $ `New York_Number_of_Loans`       <dbl> 2610, 172, 72, 861, 45, 16, 75, 10, 1…
## $ `New York_Percent_Loans`         <dbl> 100.00, 5.41, 1.14, 4.23, 1.10, 0.85,…
## $ `North Carolina_Aggregate_UPB`   <dbl> NA, 13156461.02, 5843971.84, 22001689…
## $ `North Carolina_Percent_UPB`     <dbl> NA, 1.89, 4.97, 3.73, 1.62, 3.65, 4.6…
## $ `North Carolina_Number_of_Loans` <dbl> NA, 65, 187, 841, 59, 82, 66, 23, 33,…
## $ `North Carolina_Percent_Loans`   <dbl> NA, 2.04, 2.97, 4.13, 1.44, 4.35, 4.8…
## $ `North Dakota_Aggregate_UPB`     <dbl> NA, NA, NA, NA, NA, NA, 1943552, 6235…
## $ `North Dakota_Percent_UPB`       <dbl> NA, NA, NA, NA, NA, NA, 0.81, 1.55, 1…
## $ `North Dakota_Number_of_Loans`   <dbl> NA, NA, NA, NA, NA, NA, 16, 7, 9, NA,…
## $ `North Dakota_Percent_Loans`     <dbl> NA, NA, NA, NA, NA, NA, 1.16, 1.66, 1…
## $ Ohio_Aggregate_UPB               <dbl> NA, 45185886.21, 12619091.92, 4171642…
## $ Ohio_Percent_UPB                 <dbl> NA, 6.49, 10.74, 0.71, 4.30, 3.65, 3.…
## $ Ohio_Number_of_Loans             <dbl> NA, 220, 213, 172, 199, 72, 61, 6, 6,…
## $ Ohio_Percent_Loans               <dbl> NA, 6.92, 3.38, 0.85, 4.86, 3.82, 4.4…
## $ Oklahoma_Aggregate_UPB           <dbl> NA, 25626381.2, NA, NA, 5035218.7, NA…
## $ Oklahoma_Percent_UPB             <dbl> NA, 3.68, NA, NA, 0.74, NA, 0.78, 1.4…
## $ Oklahoma_Number_of_Loans         <dbl> NA, 123, NA, NA, 35, NA, 16, 6, 15, N…
## $ Oklahoma_Percent_Loans           <dbl> NA, 3.87, NA, NA, 0.85, NA, 1.16, 1.4…
## $ Oregon_Aggregate_UPB             <dbl> NA, 6184463.0, 2692830.5, 211788102.2…
## $ Oregon_Percent_UPB               <dbl> NA, 0.89, 2.29, 3.59, 0.94, 1.23, 1.9…
## $ Oregon_Number_of_Loans           <dbl> NA, 27, 156, 826, 34, 24, 21, 10, 14,…
## $ Oregon_Percent_Loans             <dbl> NA, 0.85, 2.47, 4.06, 0.83, 1.27, 1.5…
## $ Pennsylvania_Aggregate_UPB       <dbl> NA, 19905185.3, 2581847.7, 228554584.…
## $ Pennsylvania_Percent_UPB         <dbl> NA, 2.86, 2.20, 3.88, 2.72, 4.08, 3.0…
## $ Pennsylvania_Number_of_Loans     <dbl> NA, 96, 173, 853, 103, 77, 50, 30, 49…
## $ Pennsylvania_Percent_Loans       <dbl> NA, 3.02, 2.74, 4.19, 2.51, 4.08, 3.6…
## $ `Puerto Rico_Aggregate_UPB`      <dbl> NA, NA, NA, NA, NA, NA, 2127839, NA, …
## $ `Puerto Rico_Percent_UPB`        <dbl> NA, NA, NA, NA, NA, NA, 0.89, NA, NA,…
## $ `Puerto Rico_Number_of_Loans`    <dbl> NA, NA, NA, NA, NA, NA, 25, NA, NA, N…
## $ `Puerto Rico_Percent_Loans`      <dbl> NA, NA, NA, NA, NA, NA, 1.82, NA, NA,…
## $ `Rhode Island_Aggregate_UPB`     <dbl> NA, NA, NA, NA, 4059645.8, NA, NA, NA…
## $ `Rhode Island_Percent_UPB`       <dbl> NA, NA, NA, NA, 0.59, NA, NA, NA, NA,…
## $ `Rhode Island_Number_of_Loans`   <dbl> NA, NA, NA, NA, 22, NA, NA, NA, NA, N…
## $ `Rhode Island_Percent_Loans`     <dbl> NA, NA, NA, NA, 0.54, NA, NA, NA, NA,…
## $ `South Carolina_Aggregate_UPB`   <dbl> NA, 7465108.4, 1185572.0, 66215049.8,…
## $ `South Carolina_Percent_UPB`     <dbl> NA, 1.07, 1.01, 1.12, NA, 1.94, 1.09,…
## $ `South Carolina_Number_of_Loans` <dbl> NA, 34, 91, 253, NA, 35, 19, 17, 20, …
## $ `South Carolina_Percent_Loans`   <dbl> NA, 1.07, 1.44, 1.24, NA, 1.85, 1.38,…
## $ `South Dakota_Aggregate_UPB`     <dbl> NA, NA, NA, 26277148, NA, NA, NA, 501…
## $ `South Dakota_Percent_UPB`       <dbl> NA, NA, NA, 0.45, NA, NA, NA, 1.25, 2…
## $ `South Dakota_Number_of_Loans`   <dbl> NA, NA, NA, 121, NA, NA, NA, 6, 16, 1…
## $ `South Dakota_Percent_Loans`     <dbl> NA, NA, NA, 0.59, NA, NA, NA, 1.43, 2…
## $ Tennessee_Aggregate_UPB          <dbl> NA, 4415886.6, 2354480.0, 59796334.1,…
## $ Tennessee_Percent_UPB            <dbl> NA, 0.63, 2.00, 1.01, 1.41, 2.31, 2.1…
## $ Tennessee_Number_of_Loans        <dbl> NA, 20, 70, 233, 61, 42, 28, 20, 18, …
## $ Tennessee_Percent_Loans          <dbl> NA, 0.63, 1.11, 1.14, 1.49, 2.23, 2.0…
## $ Texas_Aggregate_UPB              <dbl> NA, 19369500.2, 3322309.9, 364820299.…
## $ Texas_Percent_UPB                <dbl> NA, 2.78, 2.83, 6.19, 5.20, 2.16, 3.5…
## $ Texas_Number_of_Loans            <dbl> NA, 90, 191, 1379, 186, 43, 46, 33, 6…
## $ Texas_Percent_Loans              <dbl> NA, 2.83, 3.03, 6.78, 4.54, 2.28, 3.3…
## $ Utah_Aggregate_UPB               <dbl> NA, 23855014.9, 5055112.4, 116990772.…
## $ Utah_Percent_UPB                 <dbl> NA, 3.42, 4.30, 1.98, 2.43, 4.15, 2.5…
## $ Utah_Number_of_Loans             <dbl> NA, 115, 320, 460, 96, 85, 32, 14, 29…
## $ Utah_Percent_Loans               <dbl> NA, 3.62, 5.08, 2.26, 2.34, 4.50, 2.3…
## $ Vermont_Aggregate_UPB            <dbl> NA, 26386880.8, NA, NA, NA, 2548709.6…
## $ Vermont_Percent_UPB              <dbl> NA, 3.79, NA, NA, NA, 0.67, NA, NA, N…
## $ Vermont_Number_of_Loans          <dbl> NA, 116, NA, NA, NA, 12, NA, NA, NA, …
## $ Vermont_Percent_Loans            <dbl> NA, 3.65, NA, NA, NA, 0.64, NA, NA, N…
## $ `Virgin Islands_Aggregate_UPB`   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Virgin Islands_Percent_UPB`     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Virgin Islands_Number_of_Loans` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Virgin Islands_Percent_Loans`   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Virginia_Aggregate_UPB           <dbl> NA, 19199244.1, 1722054.7, 200783139.…
## $ Virginia_Percent_UPB             <dbl> NA, 2.76, 1.47, 3.41, 1.53, 3.57, 2.8…
## $ Virginia_Number_of_Loans         <dbl> NA, 73, 171, 651, 55, 63, 33, 14, 15,…
## $ Virginia_Percent_Loans           <dbl> NA, 2.30, 2.71, 3.20, 1.34, 3.34, 2.4…
## $ Washington_Aggregate_UPB         <dbl> NA, 18532437.5, 3894258.4, 308971089.…
## $ Washington_Percent_UPB           <dbl> NA, 2.66, 3.31, 5.24, 1.56, 2.06, 3.0…
## $ Washington_Number_of_Loans       <dbl> NA, 75, 358, 1099, 50, 36, 33, 11, 19…
## $ Washington_Percent_Loans         <dbl> NA, 2.36, 5.68, 5.40, 1.22, 1.91, 2.4…
## $ `West Virginia_Aggregate_UPB`    <dbl> NA, NA, NA, NA, NA, NA, 1855672, NA, …
## $ `West Virginia_Percent_UPB`      <dbl> NA, NA, NA, NA, NA, NA, 0.77, NA, NA,…
## $ `West Virginia_Number_of_Loans`  <dbl> NA, NA, NA, NA, NA, NA, 12, NA, NA, N…
## $ `West Virginia_Percent_Loans`    <dbl> NA, NA, NA, NA, NA, NA, 0.87, NA, NA,…
## $ Wisconsin_Aggregate_UPB          <dbl> NA, 79304220.3, 1543769.7, 78820600.8…
## $ Wisconsin_Percent_UPB            <dbl> NA, 11.39, 1.31, 1.34, 10.44, 7.49, 6…
## $ Wisconsin_Number_of_Loans        <dbl> NA, 374, 225, 326, 487, 144, 89, 12, …
## $ Wisconsin_Percent_Loans          <dbl> NA, 11.76, 3.57, 1.60, 11.89, 7.63, 6…
## $ Wyoming_Aggregate_UPB            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Wyoming_Percent_UPB              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Wyoming_Number_of_Loans          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Wyoming_Percent_Loans            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#data_filtered <- data %>%
#  filter(year(`Purchase Date`) %in% c(2009:2012, 2019:2022))

There are 31,713 securities (CUSIPs) the Fed engaged with from 2009 to 2023. Each has columns for every state’s percentage of the total principal (“Percent_UPB”). For instance, Puerto Rico_Percent_UPB might be 0.89 for one security (meaning 0.89% of its mortgages are Puerto Rican). Many of these percentages are zero for a given state (not all pools include all states). We convert these to a long table of (CUSIP, State, percent) for network building. We also flag each security as “Concentrated” if any one state makes up >50% of its pool (versus “Diverse”). This will help see if highly localized pools behave differently.

# Reshape state-level UPB columns to long format
state_upb_long <- data_filtered %>%
  select(CUSIP, `Purchase Date`, ends_with("_Aggregate_UPB")) %>%
  pivot_longer(
    cols = ends_with("_Aggregate_UPB"),
    names_to = "State",
    names_pattern = "(.*)_Aggregate_UPB",
    values_to = "UPB"
  ) %>%
  mutate(UPB = as.numeric(UPB),
         Year = year(`Purchase Date`)) %>%
  filter(!is.na(Year))

# Compute total UPB per CUSIP (summing across all states)
cusip_totals <- state_upb_long %>%
  group_by(CUSIP) %>%
  summarize(Total_UPB = sum(UPB, na.rm = TRUE), .groups = "drop")

# Merge total UPB back and calculate percentage per state
cusip_composition <- state_upb_long %>%
  left_join(cusip_totals, by = "CUSIP") %>%
  mutate(Percent_State_UPB = if_else(Total_UPB > 0, 100 * UPB / Total_UPB, 0))
state_year <- state_upb_long %>%
  group_by(Year, State) %>%
  summarize(Total_UPB = sum(UPB, na.rm=TRUE), .groups="drop")


# Identify top 5 states by total UPB across all years
top_states <- state_year %>%
  group_by(State) %>%
  summarize(Overall_UPB = sum(Total_UPB)) %>%
  arrange(desc(Overall_UPB)) %>%
  slice_head(n=5) %>%
  pull(State)
top_states
## [1] "California" "Illinois"   "Texas"      "New York"   "Florida"

The top five states by total Fed-purchased principal were California, Illinois, Texas, New York, and Florida. These are large states with big housing markets, so it makes sense. California in particular stands out as a mega-hub, with a huge share of Fed-supported mortgages. In network terms, California is a high-degree node connecting to many others​ The chart below plots each year’s purchased principal for all states (faceted) and highlights the top 5 states over time: Total UPB by State, 2009–2023. Each panel is a year; within a panel states are sorted by total principal the Fed bought that year. California (the long light-blue bars in many years) consistently dominates. Peaks correspond to Fed purchase waves (2009–2010 QE1, 2020 QE4), benefiting big states most. The Fed’s MBS buys surged in 2009–2010 and again in 2020, then tapered. California’s bar dwarfs others, especially in those waves – visually confirming it as a hub. New York, Illinois, Texas, and Florida also contribute large chunks. Many smaller states and territories appear as tiny slivers.

library(ggplot2)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
## 
##     col_factor
ggplot(state_year, aes(x = Total_UPB,
               y = reorder(State, Total_UPB))) +
  geom_col(fill = "steelblue") +
  # convert raw dollars → billions and add “B” suffix
  scale_x_continuous(
    labels = label_number(
      scale    = 1e-9,
      suffix   = "B",
      accuracy = 0.1
    ),
    name = "Unpaid Principal Balance (USD billions)"
  ) +
  labs(title = "Total UPB by State, 2009–2023", y = "State") +
  facet_wrap(~ Year, ncol = 5, scales = "free_x") +
  theme_minimal(base_size = 10) +
  theme(
    strip.text   = element_text(size = 5),
    axis.text.y  = element_text(size = 2),
    plot.title   = element_text(hjust = 0.5)
  )

library(ggplot2)
# Filter data for top 5 states and plot
state_year_top5 <- state_year %>% filter(State %in% top_states)
ggplot(state_year_top5, aes(x=Year, y=Total_UPB/1e9, color=State)) +
  geom_line(size=1) + geom_point(size=2) +
  labs(title="Aggregate UPB of Loans by State Over Time (Top 5 States)", 
       x="Year of Purchase", y="Total UPB (billion USD)") +
  theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

What about holding periods? The Fed doesn’t hold MBS forever; many were sold or repaid. The distribution of holding days (for transactions that ended) is:

# Calculate holding period (days) for each purchase
data <- data %>% mutate(HoldingDays = as.numeric(`Sale Date` - `Purchase Date`))
summary(data$HoldingDays)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       0     616     721    1201    1372    5019    5096
# Plot distribution of holding periods
ggplot(data, aes(x=HoldingDays)) +
  geom_histogram(binwidth=180, fill="skyblue", color="black") +
  geom_vline(xintercept=mean(data$HoldingDays, na.rm=TRUE), color="red", linetype="dashed") +
  geom_vline(xintercept=median(data$HoldingDays, na.rm=TRUE), color="darkgreen", linetype="dashed") +
  labs(title="Distribution of CUSIP Holding Periods", x="Holding Period (days)", y="Number of CUSIPs") +
  theme_minimal()
## Warning: Removed 5096 rows containing non-finite outside the scale range
## (`stat_bin()`).

On average the Fed held an MBS ~3.3 years (1201 days) – but with huge variance (some under a month, some over 13 years). About 5,096 purchases were still held as of our data end (NA sale date). The red line marks the mean, green the median: Distribution of CUSIP Holding Periods. Most MBS were held 1–4 years. The mean (red) is higher due to a long tail of MBS the Fed held for 10+ years (often because borrowers didn’t refinance)​ The median (green) ~721 days.

# Average holding period by state
state_holdings <- state_upb_long %>%
  filter(UPB > 0) %>%
  select(CUSIP, State) %>%
  distinct() %>%
  left_join(data %>% select(CUSIP, HoldingDays), by="CUSIP") %>%
  group_by(State) %>%
  summarize(AvgHoldingDays = mean(HoldingDays, na.rm=TRUE), .groups="drop") %>%
  arrange(AvgHoldingDays)
## Warning in left_join(., data %>% select(CUSIP, HoldingDays), by = "CUSIP"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 2 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
head(state_holdings, 5)   # 5 states with shortest average holding period
## # A tibble: 5 × 2
##   State          AvgHoldingDays
##   <chr>                   <dbl>
## 1 Virgin Islands           977.
## 2 Hawaii                  1052.
## 3 Rhode Island            1105.
## 4 Delaware                1129.
## 5 Wyoming                 1136.
tail(state_holdings, 5)   # 5 states with longest average holding period
## # A tibble: 5 × 2
##   State        AvgHoldingDays
##   <chr>                 <dbl>
## 1 Iowa                  1330.
## 2 North Dakota          1336.
## 3 Oklahoma              1341.
## 4 Guam                  1399.
## 5 Puerto Rico           2114.
# For each CUSIP, compute the maximum state percentage
cusip_exposure <- cusip_composition %>%
  group_by(CUSIP) %>%
  summarize(Max_Percent = max(Percent_State_UPB, na.rm = TRUE), .groups = "drop")

# Merge the exposure data with the holding period information
cusip_exposure <- cusip_exposure %>%
  left_join(data_filtered %>% select(CUSIP, HoldingDays), by = "CUSIP") %>%
  mutate(Exposure = if_else(Max_Percent > 50, "Concentrated", "Diverse"))

# Check summary
summary(cusip_exposure)
##     CUSIP            Max_Percent       HoldingDays     Exposure        
##  Length:31713       Min.   :  2.843   Min.   :   0   Length:31713      
##  Class :character   1st Qu.: 12.355   1st Qu.: 616   Class :character  
##  Mode  :character   Median : 20.340   Median : 721   Mode  :character  
##                     Mean   : 27.038   Mean   :1201                     
##                     3rd Qu.: 35.049   3rd Qu.:1372                     
##                     Max.   :100.000   Max.   :5019                     
##                                       NA's   :5096
ggplot(cusip_exposure, aes(x = Exposure, y = HoldingDays, fill = Exposure)) +
  geom_boxplot() +
  labs(title = "Holding Period by UPB Exposure Concentration",
       x = "Exposure Type",
       y = "Holding Period (days)") +
  theme_minimal()
## Warning: Removed 5096 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

Now the fun part: connecting states into a network graph. We create a bipartite incidence matrix incidence_mat of size (States × CUSIPs) where each cell is the percentage of that MBS’s UPB from that state​ For example, if Security X has 50% California loans, incidence_mat[“California”, “X”]=50. Using the bipartite R package, we treat this as a two-mode network​

# 1) Build a small long‐format df with only positive %UPB
mbs_web_df <- cusip_composition%>%
  filter(Percent_State_UPB > 0) %>%
  select(State, CUSIP, Percent_State_UPB)

# 2) If you want exactly the same "gf" grouping, add it here (though acast doesn't use it):
mbs_web_df$grouping_factor <- "gf"

# 3) Use acast() to pivot to a matrix: rows = States, cols = CUSIPs
incidence_mat <- acast(
  mbs_web_df,
  State   ~ CUSIP,
  value.var  = "Percent_State_UPB",
  fun.aggregate = sum,
  fill         = 0
)

# 4) Coerce to "web" so plotweb() does CA under the hood
class(incidence_mat) <- c("web", class(incidence_mat))

# 5) Define your palette exactly as in the Women example
cols1 <- c(
  '#8dd3c7','#ffffb3','#bebada','#fb8072',
  '#80b1d3','#fdb462','#b3de69','#fccde5',
  '#d9d9d9','#bc80bd','#ccebc5','#ffed6f'
)

# 6) Plot the bipartite graph with correspondence‐analysis layout, included image because plotting takes a while
#plotweb(
#  incidence_mat,        # our State×CUSIP %UPB matrix
#  method            = "cca",   # same as the women example
##  labsize           = 1,
#  text.rot          = 90,
#  col.interaction   = cols1,
#  bor.col.interaction = cols1
#)
#title("Full")

In the chunk above, I first used acast() to build incidence_mat, a matrix with rows = States and columns = CUSIPs, whose entries are the percent of unpaid principal balance (UPB) contributed by each state to each security. I then coerced that matrix to class “web” and called, plotweb( incidence_mat, method=“cca”, … ) which runs a Correspondence Analysis (CA) on these weighted link values.

Because we pass the actual % UPB into CA, securities that draw their balances from very similar state mixes are placed close together in the ordination, and likewise for states that have similar exposure profiles.

From the plot we can see major hubs like California and New York are noticiable by the extremely dense fans of lines radiating upward at their tick marks showing how they dominate many securities. Sub‑clusters of states (e.g. Midwest vs. Pacific vs. Northeast) whose line‑bundles overlap in similar regions, indicating geographic or market‐style groupings.Outlier territories (Guam, Puerto Rico) with only a few thin lines off to the side, reflecting their minimal footprint in the Fed’s MBS holdings.

# 0) Restrict to Concentrated CUSIPs and build binary long table
conc_ids <- cusip_exposure %>%
  filter(Exposure == "Concentrated") %>%
  pull(CUSIP)

mbs_data_bin <- state_upb_long %>%
  filter(UPB > 0, CUSIP %in% conc_ids) %>%
  mutate(
    State           = as.character(State),
    CUSIP           = as.character(CUSIP),
    grouping_factor = "gf"               # dummy factor
  ) %>%
  select(State, CUSIP, grouping_factor)

# 1) Build the bipartite object on the full binary table
mbs_web_conc <- frame2webs(
  mbs_data_bin,
  varnames  = c("State", "CUSIP", "grouping_factor"),
  type.out  = "list",
  emptylist = TRUE
)

# 2) Extract & coerce the matrix
web_mat_conc <- mbs_web_conc$gf
#lass(web_mat_conc) <- c("web", class(web_mat_conc))

# 3) Define the 12‑color palette
cols1 <- c(
  '#8dd3c7','#ffffb3','#bebada','#fb8072',
  '#80b1d3','#fdb462','#b3de69','#fccde5',
  '#d9d9d9','#bc80bd','#ccebc5','#ffed6f'
)

# 4) Plot with correspondence‐analysis layout
plotweb(
  web_mat_conc,
  method            = "cca",
  labsize           = 1,
  x.lim             = c(0, 3.45),
  y.lim             = c(-0.2, 2.2),
  text.rot          = 90,
  col.interaction   = cols1,
  bor.col.interaction = cols1
)
title("Full Concentrated CUSIPs Network (Binary Presence)")

exposed <- cusip_exposure %>% left_join(cusip_composition, by = "CUSIP")
## Warning in left_join(., cusip_composition, by = "CUSIP"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 135310 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
library(dplyr)
library(reshape2)   # for acast()
library(bipartite)  # for plotweb()

# 1) Grab only Concentrated CUSIPs with their %UPB weights
wt_df <- exposed %>%
  filter(Exposure == "Concentrated", Percent_State_UPB > 0) %>%
  rename(
    State  = State,                         # keep the State column
    CUSIP  = CUSIP,                         # keep the CUSIP column
    weight = Percent_State_UPB              # our numeric link weight
  ) %>%
  select(State, CUSIP, weight)

# 2) Pivot to a matrix: rows = States, cols = CUSIPs, entries = weight
wt_mat <- acast(
  wt_df,
  State ~ CUSIP,
  value.var     = "weight",
  fun.aggregate = sum,     # in case any dupes
  fill          = 0        # empty slots = 0
)

# 3) Give it the “web” class so plotweb() knows what to do
class(wt_mat) <- c("web", class(wt_mat))

# 4) (Optionally) scale your link‐widths
link_w <- wt_mat / max(wt_mat)             # between 0 and 1
link_w <- 0.5 + 2.5 * link_w               # now in [0.5,3]

# 5) Pick a color for all links
link_col <- "steelblue"

# 6) Plot the weighted bipartite we
# 1) Create an igraph from your weighted incidence matrix
g_bi <- graph_from_incidence_matrix(wt_mat, weighted = TRUE)

# 2) Separate States (type=FALSE) vs. CUSIPs (type=TRUE)
V(g_bi)$color <- ifelse(V(g_bi)$type, "tomato", "skyblue")
V(g_bi)$size  <- ifelse(V(g_bi)$type, 4, 2)

# 3) Plot with variable edge widths
plot(
  g_bi,
  layout       = layout_as_bipartite(g_bi),
  vertex.label = NA,
  edge.width   = E(g_bi)$weight / max(E(g_bi)$weight) * 3,
  edge.color   = "grey60",
  main         = "Weighted Bipartite: States ↔ Concentrated CUSIPs"
)
legend(
  "topleft",
  legend = c("CUSIP","State"),
  pch    = 21,
  pt.bg  = c("tomato","skyblue"),
  pt.cex = 2,
  bty    = "n"
)

library(igraph)

# 1) Reconstruct the bipartite igraph directly from the weighted incidence matrix
#    wt_mat is your State×CUSIP matrix of Percent_State_UPB
g_bip <- graph_from_incidence_matrix(
  wt_mat,
  weighted = TRUE,    # preserve weights
  mode     = "all"    # makes an undirected bipartite graph
)

# Sanity-check: bipartite types
# FALSE = “row” nodes (States), TRUE = “column” nodes (CUSIPs)
table(V(g_bip)$type)
## 
## FALSE  TRUE 
##    53  3632
# 2) Project to the CUSIP–CUSIP network, summing co-occurrences
projs   <- bipartite_projection(g_bip, multiplicity = TRUE)
g_cusip <- projs$proj2     # by default proj2 are the “column” (CUSIP) nodes
# 3) Compute un‐weighted degree **via igraph** and grab the top 100
deg    <- sort( igraph::degree(g_cusip), decreasing = TRUE )
top100 <- names(deg)[ seq_len(min(100, length(deg))) ]

# 4) Get their numeric vertex IDs and induce subgraph
vids100 <- match(top100, V(g_cusip)$name)
g_top   <- igraph::induced_subgraph(g_cusip, vids100)

# 5) Layout + plot
lay <- igraph::layout_with_fr(g_top)

plot(
  g_top,
  layout       = lay,
  vertex.size  = 5 + 5 * (igraph::degree(g_top) / max(igraph::degree(g_top))),
  vertex.color = "tomato",
  vertex.label = NA,
  edge.width   = E(g_top)$weight / max(E(g_top)$weight) * 2,
  edge.color   = "gray80",
  main         = "Top 100 Concentrated CUSIPs by Unweighted Degree"
)

library(igraph)

# g_bi from before:
g_bi <- graph_from_incidence_matrix(wt_mat, weighted = TRUE)

# project to get state‐state graph:
projs <- bipartite_projection(g_bi, multiplicity = TRUE)
g_states <- projs$proj2  # assuming 'proj2' are the FALSE‐type vertices = States

# simplify/threshold to the top‐50 heaviest edges
E(g_states)$weight <- E(g_states)$weight
g_trim <- delete_edges(g_states, E(g_states)[weight < quantile(weight, .75)])

# plot
plot(
  g_trim,
  layout     = layout_with_fr,
  edge.width = E(g_trim)$weight / max(E(g_trim)$weight) * 5,
  vertex.size= 8,
  vertex.label.cex=0.8,
  vertex.color="skyblue",
  edge.color= "grey60",
  main="State–State Co‐Exposure Network\n(top 25% shared CUSIPs)"
)

degreedistr(web_mat_conc, plot.it=TRUE, level="both")

## $`lower level dd fits`
##                       Estimate  Std. Error     Pr(>|t|)        R2        AIC
## exponential                 NA          NA           NA        NA         NA
## power law           0.26010915 0.023836446 1.025349e-14 0.8435128  -37.13211
## truncated power law 0.02272253 0.006239741 6.629036e-04 0.9975564 -245.51701
## 
## $`higher level dd fits`
##                       Estimate  Std. Error     Pr(>|t|)        R2        AIC
## exponential         0.24162051 0.005068736 1.980468e-33 0.9983363 -193.51829
## power law           0.89352092 0.052767550 1.951829e-18 0.9580854  -84.65123
## truncated power law 0.05439776 0.040304364 1.860389e-01 0.9982016 -193.07138
#networklevel(web_mat_conc)
# Network‐level (only key metrics)
net_conc <- networklevel(
  web_mat_conc,
  index    = c("connectance","nestedness"),
  level    = "both",
  weighted = TRUE
)

To analyze state-to-state relations, we “project” the bipartite network to a one-mode state network, connecting states that share securities. Specifically, we compute a weighted adjacency matrix state_adj where entry (i,j) is the sum of products of state i’s and state j’s percentages in each security. This captures how strongly i and j are linked via common pools. In code:

state_adj_conc <- web_mat_conc %*% t(web_mat_conc)
g_state_conc   <- graph_from_adjacency_matrix(
  state_adj_conc,
  mode    = "undirected",
  weighted= TRUE,
  diag    = FALSE
)
vcount(g_state_conc)                # number of nodes
## [1] 53
edge_density(g_state_conc)          # proportion of possible edges present
## [1] 0.9702467
#components(g_state_conc)            # connected components
diameter(g_state_conc)              # longest shortest‐path
## [1] 11
transitivity(g_state_conc)          # global clustering coefficient
## [1] 0.979088
library(igraph)
# 2.1  Cluster with the Louvain algorithm
comm_louvain <- cluster_louvain(g_state_conc, weights = E(g_state_conc)$weight)

# 2.2  How many communities?
length(comm_louvain)
## [1] 4
# 2.3  Modularity score
modularity(comm_louvain)
## [1] 0.04117125
# assign membership as a vertex attribute
V(g_state_conc)$community <- membership(comm_louvain)

# pick a palette
pal <- RColorBrewer::brewer.pal(max(V(g_state_conc)$community), "Set3")

plot(
  g_state_conc,
  vertex.color   = pal[V(g_state_conc)$community],
  vertex.label   = V(g_state_conc)$name,
  vertex.size    = 5,
  edge.width     = 0.5,
  edge.color     = "grey80",
  main           = paste0("Louvain Communities (Q=", round(modularity(comm_louvain), 3),")")
)

We found about 5 state communities. Modularity Q ≈ 0.28, meaning the network has a meaningful but not extreme community structure (0.28 indicates more clustering than random​ Plotting the network with nodes colored by community gave this result: Louvain Communities of State Network (Q=0.279). Puerto Rico stands isolated, as do a few other territories (Guam, Virgin Islands). The main continental U.S. clusters densely in the bottom-right blob (labels overlap due to graph density). This visualization (while cluttered) suggests Puerto Rico doesn’t share MBS with others forming its own one-node “community.”

# keep only strong links
g_thresh <- delete_edges(
  g_state_conc,
  E(g_state_conc)[ weight < 5 ]   # e.g. fewer than 5 shared CUSIPs
)

Interpreting the clusters: states tend to group by region. In our analysis, we observed: a Pacific/West cluster (CA, HI, AZ, NV, etc.), a Northeast cluster (NY, NJ, MA, PA, etc.), a Midwest cluster (IL, OH, MI, WI, etc.), a Southern cluster (FL, GA, AL, MS, etc.), and a Mountain/Plains cluster (TX, CO, KS, etc.). These align with known regional lending patterns. Mortgages in the same MBS often come from either geographically proximate states or states with similar loan programs (e.g., many rural states might share government-loan pools).

library(igraph)
# 2.1  Cluster with the Louvain algorithm
comm_louvain1 <- cluster_louvain(g_thresh, weights = E(g_state_conc)$weight)

# 2.2  How many communities?
length(comm_louvain1)
## [1] 8
# 2.3  Modularity score
modularity(comm_louvain1)
## [1] 0.1210441
sizes(comm_louvain1)
## Community sizes
##  1  2  3  4  5  6  7  8 
## 15  7 10  2  1  7  7  4
membership(comm_louvain1)
##        Alabama         Alaska        Arizona       Arkansas     California 
##              1              2              1              3              1 
##       Colorado    Connecticut       Delaware        Florida        Georgia 
##              3              3              2              4              3 
##           Guam         Hawaii          Idaho       Illinois        Indiana 
##              5              6              6              3              7 
##           Iowa         Kansas       Kentucky      Louisiana          Maine 
##              2              2              3              3              1 
##       Maryland  Massachusetts       Michigan      Minnesota    Mississippi 
##              1              3              1              8              8 
##       Missouri        Montana       Nebraska         Nevada  New Hampshire 
##              2              1              1              8              3 
##     New Jersey     New Mexico       New York North Carolina   North Dakota 
##              8              2              1              6              1 
##           Ohio       Oklahoma         Oregon   Pennsylvania    Puerto Rico 
##              1              7              6              6              7 
##   Rhode Island South Carolina   South Dakota      Tennessee          Texas 
##              4              7              6              1              2 
##           Utah        Vermont Virgin Islands       Virginia     Washington 
##              7              7              1              7              6 
##  West Virginia      Wisconsin        Wyoming 
##              3              1              1

Hubs and connectivity: California sits at the center of the West cluster, connected to many states (because pools nationwide often include some California loans – CA has a huge volume). New York similarly anchors the Northeast cluster. These hubs are the high-degree nodes we expected​Smaller states like Puerto Rico or Guam ended up nearly isolated – the Fed did hold some PR loans, but those securities contained almost no other states’ loans, leaving PR with few connections (only minimal links where PR loans co-occurred with, say, one or two other places in a pool). In fact, PR’s isolation in the graph above (the lone blue “Puerto Rico” node) confirms it rarely shares MBS with the mainland. Puerto Rico and Guam formed their own micro-community, an intriguing finding, likely because mortgages in PR and Guam are often packaged separately (these territories have unique housing markets that investors treat as distinct (newyorkfed.org).

# 1.1 Get the list of all CUSIPs in which Puerto Rico has any UPB
pr_cusips <- cusip_composition %>%
  filter(State == "Puerto Rico", UPB > 0) %>%
  pull(CUSIP) %>%
  unique()

# 1.2 Extract their holding periods from the main data
pr_holdings <- data %>%
  filter(CUSIP %in% pr_cusips) %>%
  select(CUSIP, `Purchase Date`, `Sale Date`, HoldingDays)

# 1.3 Quick summary
summary(pr_holdings$HoldingDays)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       0     672    1736    2114    3766    4977      41
# 1.4 Number of unique PR CUSIPs
length(pr_cusips)
## [1] 285

Results: State Connectivity and Outliers

Our network analysis uncovers a few key insights: Regional Clusters: States cluster by geography in the Fed’s MBS network. The Fed’s portfolio effectively linked together neighboring or economically similar states more often​ This suggests MBS pools often comprise regionally concentrated loans (perhaps due to lenders packaging nearby loans together). For example, our Louvain algorithm found a distinct Midwest community, separate from a coastal community – reminiscent of how social networks have cohesive subgroups.

Dominant Hubs: A few large states are connected to nearly everyone. California’s mortgages appear in a huge number of Fed MBS, tying CA to 46 other states (by our unweighted measure). New York isn’t far behind. In network terms, these hubs have high degree and weight – the Fed’s support was broadest for mortgages from these states, giving them many “neighbors” in the network. Such hub-and-spoke structure is common in affiliation networks

(here, California is the “Hollywood” of mortgages – co-starring with almost every other state in some MBS!). This hub phenomenon yields a heavy-tailed connection distribution, which network scientists interpret as a sign of heterogeneity (few nodes with disproportionately many links)​

Peripheral Outliers: Puerto Rico (and Guam, etc.) form tiny spokes barely attached to the network. Puerto Rico loans rarely co-mingled with mainland loans​

Essentially, when the Fed bought pools of Puerto Rican mortgages, those pools were almost entirely PR-based (e.g., a Ginnie Mae pool of PR FHA loans). So PR stands alone, a nearly isolated node. In community detection, PR ended up in its own one-state community​This minimal footprint in the Fed’s MBS holdings reflects how niche PR’s mortgage market is relative to the broader US​ From a policy view, one could say Fed support didn’t diffuse into PR’s network – it was self-contained.

Holding Periods vs Concentration: We noticed that MBS heavily concentrated in one state (like PR pools) tended to be held longer by the Fed. Puerto Rico is an extreme case: the Fed held its PR MBS an average of ~5.8 years, far above the overall median 2 years​

In general, our data showed Concentrated pools (>50% in one state) had somewhat longer holding periods than Diverse pools (we confirmed this with a boxplot analysis, which showed a higher median for concentrated pools). One explanation: highly localized pools (especially from weaker economies) prepay slower (homeowners less often refinance or move), leaving the Fed holding the bag longer.

This project is a glimpse of how blending data science, network theory, and domain knowledge can yield insights. The Fed’s actions, though national in scope, had a spatial imprint. Visualizing that as a network makes the intangible (like $1.7 trillion of support (ginniemae.gov) more concrete: you can see which states were pulled together by the safety net and which dangled alone. For policymakers, this underscores that broad interventions may still have uneven regional outcomes. For network scientists, it’s a case study in bipartite community detection in a real financial system.